Historic,  Archive  Document 

Do  not  assume  content  reflects  current 
scientific  knowledge,  policies,  or  practices. 






United  States  Department  of  Agriculture 
Office of International Cooperation and Development




SAMPLING METHODOLOGIES FOR MORE COST-EFFECTIVE COLLECTION OF
FOOD CONSUMPTION AND EXPENDITURE DATA


by:

Tom Zalla

Agricultural Economist
April 11, 1988


Report prepared for the Nutrition Economics Group, OICD, United
States Department of Agriculture, Washington D.C.




I.  INTRODUCTION

II.  THE RAPID APPRAISAL PERSPECTIVE AS APPLIED TO
CONSUMPTION AND EXPENDITURE SURVEYS

III.  SAMPLING METHODOLOGIES USED IN OTHER
CONSUMPTION/EXPENDITURE SURVEYS

A.  Madagascar Rice Demand Study

B.  Liberia Urban Food Consumption Survey

C.  Haiti Household Expenditure And Consumption Survey

D.  Kilimanjaro Food Consumption Survey

IV.  LESSONS FOR DESIGNING COST-EFFECTIVE SAMPLING
METHODOLOGIES

V.  THE NEG RAPID APPRAISAL WORKSHOP

VI.  RECOMMENDED SAMPLING METHODOLOGY FOR NEG SURVEYS

A.  Sampling Frame and Sampling Procedures

B.  Weighting Procedures

C.  Statistical Inference

VII.  CONCLUSION AND RECOMMENDATIONS

BIBLIOGRAPHY

ANNEX A:  Variance Formulas For A Stratified, Multi-Stage
Sample With Selection Probabilities Proportional To A Measure
Of Size

ANNEX B:  GLOSSARY


I.  INTRODUCTION


The Nutrition Economics Group (NEG) of the United States
Department of Agriculture assists USAID Missions throughout the
world in preparing, executing and analyzing surveys of food
consumption and expenditures. Consumption and expenditure surveys
provide data essential for detailed analysis of a wide range of
policies relating to food, nutrition, prices and incomes. The
general thrust of NEG's work has been to finance innovative,
cost-effective methodologies that can significantly reduce the
cost and time required to gather the kind of data needed for
policy analysis. The first stage of this effort has been to draw
upon the efforts of a number of consultants experienced in various
aspects of food consumption and expenditure analysis, each using a
more or less unique methodology in a particular country.

In an attempt to capitalize on the experience of these
innovative efforts by individuals with different perspectives, the
NEG sponsored a workshop on Rapid Appraisal Techniques For Food
Policy Analysis in April, 1987. The purpose of the workshop was
to explore how various sampling and data collection methods might
reduce the amount of data collected and speed their release, while
retaining their usefulness for rigorous quantitative analysis.
Following the workshop NEG commissioned more in-depth studies of
some of the key proposals made. This report is one of those
studies.


This report reviews sampling methodologies used in several
consumption surveys with rapid appraisal objectives. It also
summarizes some data collection and sampling methodology issues
raised at the workshop. It proposes an approach to sampling for
consumption surveys that is widely understood and that admits of
fairly rapid application, yet can accommodate the most rigorous
of conventional techniques. It provides formulas for estimating
variances that, though slightly biased in some instances, are
simple to use where computers or computer programming skills are
not available. The emphasis throughout is on realistic assessment
of all types of potential error in obtaining consumption and
expenditure data. The objective is to make the shortcuts suggested
more palatable to those who are more disciplinary in their
approach to sampling.

II.    THE  RAPID  APPRAISAL  PERSPECTIVE  AS  APPLIED  TO 
CONSUMPTION  AND  EXPENDITURE  SURVEYS 

One conclusion that emerged from the Rapid Appraisal Workshop
is that there are significant limits on the number of shortcuts
one can take and still obtain data of sufficient quality for
quantitative food policy analysis. The keynote paper for the
workshop (Zalla, 1987) contained a review of several non-formal
and more qualitative techniques for gathering food consumption and
expenditure data. Such techniques can frequently yield very good
descriptive data. They also can generate an understanding of
consumption/expenditure systems that is not easily obtained from
questionnaire surveys.

Such techniques, however, are of limited use where quantitative
analysis of food consumption interactions is required. Seasonal
production and consumption patterns, the presence of lumpy
income, consumption and expenditure events and the need for
reasonably precise measures of consumption and expenditures for
many types of analyses limit the amount of detailed data
collection one can dispense with, except in very narrowly defined
situations. For this reason we use the term cost-effective rather
than rapid appraisal throughout this report. Nonetheless, there is
considerable scope for improving the turnaround time of
consumption and expenditure data, while, at the same time,
reducing the cost of acquiring them. The sampling and data
collection methodologies described below are directed to these
ends.




III.  SAMPLING METHODOLOGIES USED IN OTHER
CONSUMPTION/EXPENDITURE SURVEYS

A.  Madagascar Rice Demand Study


AIRD (Ahlers, 1983) assisted the Ministry of Agriculture of
Madagascar in conducting a survey of rice consumption in the
country's seven regional capitals in 1984. Using Ministry of
Agriculture statistical and survey staff, survey organizers
utilized a stratified, two-stage sample of 280 households in
Antananarivo and 80 households in each of the six regional
capitals. Each city in the sample was treated as a separate
stratum. Antananarivo, the capital, was further stratified into
income groups by means of housing type.

Local level administrative units, the fokotany, served as the
primary sampling units (psu) at the first stage. The population of
each fokotany was readily available from administrative sources.
A cumulative listing of all households in each fokotany within a
stratum was used to draw a systematic sample of fokotany with
probabilities proportional to estimated size. Ten fokotany were
selected in each of the smaller centers and 28 were selected in
Antananarivo.

A list of the names of the heads of household living in each
of the selected fokotany served as the second-stage sampling
frame. As at the first stage, a systematic sampling procedure was
used to ensure a representative distribution of the sample units.
Sample size was constant for each fokotany within a given stratum.

A systematic sampling procedure using a cumulative population
listing amounts to sampling with probability proportional to a
measure of size. Each fokotany has a probability of being selected
that is equal to its population relative to the population of all
fokotany within the stratum. At the second stage each household
has a probability of selection equal to the size of sample drawn
from the fokotany divided by its household population. When used
in conjunction with a sample of constant size within each selected
primary sampling unit, this procedure generates a self-weighting
sample. The advantage of a self-weighting sample is that it can be
treated as a simple random sample for estimating population means.
This can be important where computers are not available or where
circumstances do not permit quick calculation of variance
estimates by complex formula.
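
The selection procedure described above can be sketched in a few
lines of Python. This is only an illustration under stated
assumptions: the fokotany household counts are hypothetical, and
the function names are ours, not taken from the Madagascar study.
The sketch draws first-stage units systematically from a
cumulative listing, then shows that a constant take per selected
unit gives every household the same overall selection probability.
Note that with n units drawn, a unit's first-stage inclusion
probability is n times its share of the stratum population.

```python
import bisect
import itertools
import random


def pps_systematic(sizes, n_select, rng):
    """Return indices of units chosen systematically from a cumulative listing."""
    cumulative = list(itertools.accumulate(sizes))
    total = cumulative[-1]
    interval = total / n_select          # systematic sampling interval
    start = rng.random() * interval      # random start within the first interval
    chosen = []
    for k in range(n_select):
        point = start + k * interval
        # Unit i covers the range [cumulative[i-1], cumulative[i]).
        index = bisect.bisect_right(cumulative, point)
        chosen.append(min(index, len(sizes) - 1))  # guard against float rounding
    return chosen


def overall_probability(size, total, n_select, per_unit):
    """Overall household selection probability under PPS plus a constant take."""
    first_stage = n_select * size / total   # probability the unit is selected
    second_stage = per_unit / size          # probability of a household within it
    return first_stage * second_stage


if __name__ == "__main__":
    # Hypothetical household counts for eight fokotany in one stratum.
    sizes = [120, 340, 95, 410, 230, 180, 75, 260]
    picked = pps_systematic(sizes, n_select=3, rng=random.Random(1))
    print("selected units:", picked)
    for s in sizes:
        print(s, overall_probability(s, sum(sizes), 3, 8))
```

Because the unit size cancels out of the product, every household
has the same probability, which is what makes the sample
self-weighting. A unit larger than the sampling interval could be
selected more than once; the sketch does not treat that case
specially.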

The Madagascar survey consisted of a single visit to each
selected household over one six-week period of the year. Since the
sample population was an urban population, researchers felt they
could obtain an adequate representation of average annual
household consumption with such a methodology.

B.  Liberia Urban Food Consumption Survey

In conjunction with the Nutrition Economics Group of USDA,
Purdue University conducted a study of urban food consumption
patterns in Liberia in 1986 (Heimstra, 1987). The study was
limited to Monrovia, several other major urban centers and a
representative number of smaller urban centers in rural areas with
heavy concentrations of rice production. In total, seven urban
areas were studied.

The sampling methodology utilized a type of area sampling.
Each town was divided into three (six in Monrovia) sectors of
roughly equal household population. These sectors were
essentially geographically defined substrata within the particular
urban center. Each sector/substratum was assigned to one
enumerator. Enumerators prepared a sketch showing all streets and
major paths in their assigned sector. They then proceeded to
subdivide the sector into sub-areas of approximately 100
structures. These sub-areas served as the primary sample units.

To draw the sample enumerators counted the number of
structures in each sub-area and prepared a cumulative listing of
structures. A number between one and the total number of
structures in the sector/substratum was selected from a random
number table and the sub-area containing that structure became the
selected first-stage sample unit. There was, thus, only one
first-stage unit selected per substratum. Each city was, in
addition, a separate stratum.

Once the sub-areas/primary sample units were selected,
enumerators listed all of the structures within the sub-area. This
list of structures became the sampling frame at the second stage
and the structures themselves were the second-stage sample units.
A random sample of 37 structures (46 in Monrovia) was selected for
interviewing. This represented roughly a 1/3 (plus 10% reserve)
sample of the structures within the selected sub-areas.

For the purposes of this study, a structure was defined as
one being lived in by private individuals. Apartment units that
appeared to be independent housing units were defined as separate
structures even though they were physically connected. Within
each chosen structure enumerators selected a single household to
be interviewed. Enumerators selected households by means of a
predetermined alphabetical selection based on the first names of
the household heads. Only households defined as a "housekeeping
household", i.e. one that prepared at home at least half of its
meals for own consumption during the survey period, were included
in the sample.

Since the second-stage sample unit, the structure, had an
unknown number of households, the ex-ante extrapolation weight was
multiplied by the number of households found in the structure at
the time of the interview. In effect, the interviewed household
was assumed to be representative of all households within the
structure. This had the effect of increasing the representation
of smaller households in proportion to their incidence in the
population of the sub-area. This was necessary in order to obtain
representative household-level estimates at the level of the
sub-area. For the purposes of making mean estimates for
households within an urban area, households from the three (six in
Monrovia) sub-areas in each urban center were combined as though
they were drawn from a weighted, simple random sample. Strata
weights assigned for making overall population estimates then
reflected only the size of the individual urban centers as a
proportion of the total population of all the centers combined.

The small number of first-stage sample units presented a
problem in the Liberia study. Urban housing patterns tend to
reflect concentrations of similar structures and household
economic status. Since each sector was already being divided into
sub-areas of 100 structures, it seems that it would have been
relatively easy to sample one third to one fourth as many
structures in three or four separate sub-areas rather than
concentrating the sample in only one.
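
The extrapolation-weight adjustment described for the Liberia
survey, in which each interviewed household stands in for all
households found in its structure, can be illustrated as follows.
This is a minimal sketch: the record layout, the function name and
the figures are hypothetical, and the report does not give the
study's actual computations.

```python
def structure_weighted_mean(records):
    """Weighted mean where each record is (value, base_weight, households_in_structure).

    The ex-ante weight is multiplied by the number of households found
    in the structure, since only one of them was interviewed.
    """
    numerator = sum(value * weight * n_hh for value, weight, n_hh in records)
    denominator = sum(weight * n_hh for _, weight, n_hh in records)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical weekly rice consumption (kg) for two interviewed households.
    sample = [
        (10.0, 3.0, 1),  # the only household in its structure
        (20.0, 3.0, 3),  # one of three households sharing a structure
    ]
    print(structure_weighted_mean(sample))
```

In this made-up case the second household counts three times as
heavily as the first, because it represents three households
rather than one.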

C.  Haiti Household Expenditure And Consumption Survey

An expenditure survey being conducted in Haiti during 1986-87
utilized several methodological variants that offer promise with
respect to rapid appraisal. The survey utilized a stratified
two-stage design to select the household units to be interviewed.
Unlike the other two surveys, the Haiti survey included both urban
and rural areas.


To accommodate the variation in consumption patterns between
urban and rural areas, and between geographically distinct rural
areas in the country, the survey design divided the country into
nine relatively homogeneous strata. This first level of
stratification corresponded to the major geographical domains:
urban and rural areas for each of the four planning regions of the
country, plus a separate stratum for metropolitan Port-au-Prince.

Census enumeration areas (sections d'enumeration) were used
as the primary sampling units. All enumeration areas (EA) were
delineated on census sketch maps. Their boundaries normally
follow defined topographical characteristics so that they can be
identified on the ground. Each EA was classified as urban, rural
or peri-urban. Since the peri-urban areas are generally more
rural in nature, these were included in the rural strata. In
addition, the urban geographic strata were subdivided into three
economic substrata reflecting high, medium and low levels of
income. The rural strata were subdivided into plains and mountain
substrata. Each EA was allocated to that economic substratum
which appeared to contain most of its population.

Once EAs were classified and placed into the appropriate
substratum they were listed in serpentine fashion in order to
provide implicit stratification. Their household populations from
the preliminary census tabulations were used as a measure of size.
These measures of size were cumulated within a substratum. The
total number of households in the substratum, divided by the
desired number of EAs to be sampled, defined the systematic
sampling interval used to select the specific EAs. Selection of
households proceeded in similar fashion with 10 households sampled
per EA. Since a constant size sample of 10 households per EA was
selected, the sample was approximately self-weighting. In
addition, since the sample within each stratum was allocated
proportionately among the substrata, the sample was also
approximately self-weighting within a stratum.

Because policy makers wanted more or less equally reliable
estimates from each major geographic domain, the overall sample
was not allocated in proportion to the populations of the strata.
This prevented the sample from being completely self-weighting.
The Haiti survey allocated approximately 2/3 of the sample to the
urban strata based on anticipated variability. This was expected
to generate more efficient estimates than would have been likely
with a completely self-weighting sample, assuming the variance
assumptions were correct.

Actual data collection for the Haiti survey was to be spread
over a 52 week period. The selected EAs were divided into 13
subsamples, each randomly assigned to one four-week period of the
year. Data were collected from selected households only once.
Rotation of the sample over the year provided the necessary
seasonal dimension to average consumption patterns.
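
The rotation scheme can be sketched as a random division of the
selected EAs into 13 subsamples, one per four-week period. The
sketch below is an assumption about how such an assignment might
be made, not a description of the procedure actually used in
Haiti; in particular it ignores stratum membership, and the EA
identifiers and function name are hypothetical.

```python
import random


def assign_periods(ea_ids, n_periods, rng):
    """Randomly split EAs into n_periods subsamples of near-equal size.

    Returns a dict mapping each EA identifier to a period number 1..n_periods.
    """
    shuffled = list(ea_ids)
    rng.shuffle(shuffled)  # random order, then deal out round-robin
    return {ea: (position % n_periods) + 1 for position, ea in enumerate(shuffled)}


if __name__ == "__main__":
    plan = assign_periods(range(1, 40), n_periods=13, rng=random.Random(0))
    for period in range(1, 14):
        print(period, sorted(ea for ea, p in plan.items() if p == period))
```

Dealing the shuffled EAs out in turn keeps the subsamples equal in
size, so each four-week period carries the same field workload.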


When first-stage sample units are selected using non-current
population figures, as is often the case when using census
enumeration areas as psus, the number of households found at the
time of listing will often be different from that attributed to
the psu at the time of sampling. The individual household weights
need to be adjusted to reflect this difference. They also must be
adjusted to reflect the presence of multiple households not
separately listed, consolidated households that were listed
separately, as well as unoccupied and other ineligible sample
housing units. The Haiti survey adjusted weights for all of these
factors. Non-response adjustments were also necessary in certain
EAs where fewer than 10 household interviews were achieved after
all substitutes were exhausted. The final household weights then
became a permanent part of the data record for each household.
This did not present a computational problem since data are being
analyzed by computer.
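
One common way to build such adjustments into a household weight
is shown below. It is a sketch under assumptions: the weight is
taken as the inverse of the selection probability, with the
listing count replacing the measure of size at the second stage
and a simple ratio adjustment for non-response. The report does
not give the Haiti survey's exact formula, and the function name
and figures here are hypothetical.

```python
def household_weight(stratum_size, n_eas, ea_measure, ea_listed,
                     n_target, n_completed):
    """Inverse-probability household weight with listing and non-response adjustments.

    stratum_size : total measure of size (households) in the stratum at sampling
    n_eas        : number of EAs selected in the stratum
    ea_measure   : measure of size used to select this EA
    ea_listed    : eligible households actually listed in this EA
    n_target     : households selected for interview in this EA
    n_completed  : interviews actually completed in this EA
    """
    p_first = n_eas * ea_measure / stratum_size   # PPS selection of the EA
    p_second = n_target / ea_listed               # selection within the listed EA
    base = 1.0 / (p_first * p_second)
    return base * (n_target / n_completed)        # non-response adjustment


if __name__ == "__main__":
    # Listing agrees with the measure of size: the self-weighting case.
    print(household_weight(10000, 5, 200, 200, 10, 10))
    # Listing found 250 households instead of 200.
    print(household_weight(10000, 5, 200, 250, 10, 10))
    # Same EA, but only 8 of the 10 interviews were completed.
    print(household_weight(10000, 5, 200, 250, 10, 8))
```

When the listing count equals the measure of size and all
interviews are completed, the weight is the same for every
household in the stratum; any departure shows up as a
proportional change in the weight.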

D.  Kilimanjaro Food Consumption Survey

The author carried out a study of food consumption in rural
Kilimanjaro District of Tanzania in 1973-4. The study utilized a
24 hour recall of all meals prepared and all food consumed by
household members and guests. The sampling methodology was
similar to that utilized in the Haiti survey, except that
stratification was implicit rather than explicit.

Tanzania had detailed published statistics by census
enumeration area and had district maps and individual sketches
showing the location of each EA. Because of topographical
influences on income and consumption patterns, the sample area was
divided into 12 contiguous vertical slices/sectors of the
mountain, each constructed so as to contain essentially the same
number of households. The EAs were listed in serpentine fashion
beginning at the top of the first sector, proceeding to the
bottom, and onto the bottom of the next one, proceeding to the
top. A systematic sample of 36 EAs was drawn using 1/36 of the
cumulative household population listing as the systematic sampling
interval. Households were listed within each of the selected EAs
and eight were subsampled in similar fashion.
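
The serpentine listing can be sketched as follows, assuming each
sector's EAs are already ordered from the top of the mountain to
the bottom. The identifiers and the function name are
hypothetical. Alternating the direction from one sector to the
next keeps neighbouring entries in the list geographically close,
which is what gives a systematic sample drawn from the list its
implicit stratification.

```python
def serpentine_order(sectors):
    """Flatten sectors (each listed top to bottom) into one serpentine list.

    Even-numbered sectors are read top to bottom, odd-numbered ones
    bottom to top, so the list never jumps from a valley to a summit.
    """
    ordered = []
    for index, eas in enumerate(sectors):
        ordered.extend(eas if index % 2 == 0 else reversed(eas))
    return ordered


if __name__ == "__main__":
    sectors = [
        ["A-top", "A-mid", "A-low"],
        ["B-top", "B-low"],
        ["C-top", "C-low"],
    ]
    print(serpentine_order(sectors))
```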


The process of selecting households with probability
proportional to a measure of size at the first stage, as is
accomplished with the systematic sampling procedure, coupled with
a constant size sample per EA at the second stage created an
essentially self-weighting sample. This was necessary since
preliminary tabulations were manual. No adjustment was made for
the difference between the measure of size used to draw the sample
and the number of households actually listed by enumerators. In
some cases these differences were substantial. However, it
frequently was not clear that such differences were due to changes
in the household populations of the sampled EAs as opposed to
failing to identify the actual boundaries utilized in the census.

Conceptually the selected first-stage unit was only an
interval on a continuum and it made little practical difference
that we may have not identified the proper number of households at
the time of listing the households in the field. Only if
out-migration were substantial and varied at different localities
on the mountain would the differences in sampled and listed
populations represent a problem of practical significance.
Moreover, even if such were the case, the relatively small sample
size (8*36=288) meant that sampling errors would have swamped the
effect of differential out-migration.

The self-weighting nature of the sample allowed use of simple
random sampling estimators without introducing bias. For
calculating variances the study used both a simple random sample
estimator and a modified two-stage estimator given by Yates (1960,
p. 197) that, though slightly biased, is computationally simple.
The two-stage estimator was used only on a few key variables to
determine the average difference between the two methods.
Variances calculated via the simple random sampling formula were
then roughly corrected for the average difference between the two.
This approach is adequate in rural areas where sampling errors are
usually insignificant as compared to non-sampling errors.


IV.  LESSONS FOR DESIGNING COST-EFFECTIVE SAMPLING METHODOLOGIES


In examining the merits of sampling methodologies from the
perspective of cost-effectiveness it is important to consider the
potential for both sampling and non-sampling errors in consumption
surveys. This is especially true in rural areas where household
production is an important part of consumption and expenditures.
It is equally true where the local agencies responsible for
implementation lack manpower, material resources, work discipline
or adequate supervision.

Sampling errors arise both from faulty sample design and from
the fact that not all members of the population are interviewed.
Non-sampling errors arise from such factors as poorly trained
enumerators, poorly designed and tested survey instruments and
survey operations, untimely field support to correct problems in
execution, lack of supervision to ensure enumerators are
performing to standard, imprecise measures or recall periods, poor
translation, respondent deception or lack of knowledge and a host
of other factors. In the typical LDC context errors arising from
these kinds of factors are usually quantitatively much more
significant than sampling errors. They also increase with sample
size. From the point of view of accuracy of estimates, survey
designers need to balance resources and time spent constructing,
utilizing and making inferences from an unbiased sampling frame
with resources and time devoted to field data collection. The
former minimizes required sample size for a given level of
precision while the latter maximizes data quality and improves
accuracy.^


^ Precision refers mainly to sampling error, i.e. the extent to
which a sample approximates the underlying data from which it is
drawn. Accuracy refers to precision plus bias. Many very useful
estimators are slightly biased. They are used where the bias is
small because they are simple to employ or are practically
measurable. Bias includes mostly the effect of non-sampling error.
Non-sampling error is largely an unknown element, though we can
estimate certain aspects of it.


There is also a relationship between the length of the
questionnaire and data quality that interacts with sample size.
For a given budget a larger questionnaire will permit a smaller
sample size. The appropriate trade-off will depend on survey
objectives and researcher perception of the gains or losses in
quality as these relate to both the size and the layout of the
survey instrument and the frequency of visits to survey
households. These are as much problem identification, data
collection and questionnaire design issues as they are sampling
design issues. Sampling methodology will, in most cases, be
dictated by how researchers perceive these other issues.

In general terms, a self-weighting sample provides greater
flexibility with respect to rapid tabulation of results under the
widest set of circumstances. Self-weighting samples reduce to
simple random samples for calculating totals and means. This is
very helpful where manual tabulation is necessary or computer
programming skills are limited. As was done with the Kilimanjaro
survey, variances can be approximated by first calculating a dozen
or so variances using both the simple random sample formula and a
slightly biased but easy to calculate two-stage formula. The
remaining variances can be calculated using the simple random
sample formula, correcting for the proportional difference between
the two methods as suggested by the results of those actually
calculated using both methods. Of course, where facilities and
resources permit, the same sampling methodology can accommodate
variable weighting for a more accurate, less biased result.
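
The variance shortcut just described can be sketched in Python.
One caution: the two-stage estimator from Yates is not reproduced
in this part of the report, so the sketch substitutes a simple
between-cluster estimator (the variance of the cluster means
divided by the number of clusters), which assumes equal-sized
clusters. That substitution, the function names and the data are
all our assumptions, intended only to show the mechanics of
computing an average correction on a few key variables and
applying it to the rest.

```python
from statistics import mean, variance


def srs_variance_of_mean(values):
    """Variance of the sample mean under the simple random sample formula."""
    return variance(values) / len(values)


def cluster_variance_of_mean(clusters):
    """Between-cluster variance of the mean (stand-in two-stage estimator)."""
    cluster_means = [mean(cluster) for cluster in clusters]
    return variance(cluster_means) / len(cluster_means)


def average_correction(key_variables):
    """Average ratio of two-stage to simple random sample variance.

    key_variables: one entry per key variable, each a list of clusters,
    each cluster a list of household values.
    """
    ratios = []
    for clusters in key_variables:
        pooled = [value for cluster in clusters for value in cluster]
        ratios.append(cluster_variance_of_mean(clusters) / srs_variance_of_mean(pooled))
    return mean(ratios)


def corrected_variance(values, correction):
    """Simple random sample variance scaled by the average correction."""
    return srs_variance_of_mean(values) * correction


if __name__ == "__main__":
    # Hypothetical key variable observed in three clusters of three households.
    key = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]
    factor = average_correction(key)
    print("correction factor:", factor)
    other_variable = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 3.0, 6.0, 5.0]
    print("corrected variance:", corrected_variance(other_variable, factor))
```

In practice the correction would be averaged over a dozen or so
key variables, as the text suggests, rather than the single
variable used here.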

The Haiti and Kilimanjaro surveys provide examples of various
aspects of a suitable rapid appraisal sampling methodology. Both
surveys employed listing and stratification techniques of one sort
or another to improve sample efficiency. Both surveys sampled with
probability proportional to size; and both collected data from
25-35 first-stage units and utilized a constant sized sample in
the selected cluster areas. These procedures generated samples
that were largely self-weighting within a stratum and included a
sufficient number of psus to provide reasonably precise estimates.
These are the essential ingredients for a flexible, efficient and
easily prepared, executed and tabulated sample survey.

The Haiti survey provided a novel methodological variant that
has wide applicability in politically unstable environments.
Dividing the sampled EAs into 13 randomly selected subsamples meant
that each subsample provided an independent estimate of national
consumption, albeit one subject to considerable seasonal bias.
This proved valuable when political unrest forced cessation of
data collection after collecting only eleven months of data. In
spite of collecting less data than planned, the survey was
complete for the nation. Had survey planners chosen to survey in
blocks requiring the completion of all 13 four-week periods to
obtain a complete sample, the usefulness of the data that were
actually collected would have been compromised.

The decision to interview each household only once and to use
separate households for measuring consumption and expenditures at
different times of the year is another interesting methodological
variant of the Haiti survey. The researchers argue convincingly
that this approach is more rapid and cost-effective where
analytical needs can be satisfied with data for typical groups of
households rather than for individual households. Accurately
measuring nutrient intake, in particular, requires many more
visits per household than those commonly recommended for repeated
household visits in surveys in developing countries (Freudenheim
et al., 1987). By settling for data for population groups the
total number of interviews required, and thus the cost of the
survey, is significantly reduced.

In the typical rapid appraisal context researchers will have
to rely on administrative lists of population or dated census
information for a sampling frame. All of the methodologies reviewed
above utilized one of the two. Administrative lists of
households, used in Madagascar and Liberia, are generally more
available and more clearly defined. The smallest subdivisions for
which population data can be found in the capital city can serve
as first-stage sample units. Maps of census enumeration areas,
especially in rural areas, are frequently nonexistent or lack
sufficient detail to permit precise re-identification on the ground.
But the census enumeration areas have the advantage of being
small, of similar size and well distributed across ecological and
agricultural variation, two factors having a profound impact on
consumption patterns. Ideally, a rapid appraisal sampling
methodology should be robust enough to adapt itself readily to either
type of sampling frame.

Administrative and census population data usually give a
reasonably good measure of the actual size of the administrative or
census enumeration areas to be included in the sampling frame.
Since some households may have moved or doubled up since the last
data were collected, it is always important to verify the sampling
methodology's ex ante estimate of the size of the sample units
against what enumerators find at the time of actually listing
households. Weights can then be assigned to the individual households
surveyed to correct for large discrepancies.

In a relatively stable context, such as existed in
Kilimanjaro, or where the measure of size (previous census or
administrative statistics) is recent, one could sometimes ignore
such adjustments without affecting the results appreciably. A
quick comparison of the listed population with the census figures
will reveal whether such an adjustment is necessary. Much will
depend on how the data are to be tabulated. Unless done by
computer, tabulating data from weighted households can be very
tedious indeed. Ignoring minor differences between the measure of
size used in sampling and the actual size found at listing allows
each observation to have an equal weight within the sub-population
in question. As a consequence, except for computing variances,
the data can be treated as a simple random sample.

But where the differences are significant, as they frequently
are in urban areas, and where those differences appear to arise
from population dynamics rather than identification or listing
problems, some adjustment needs to be made. Otherwise, the survey
measures what existed at the time the population data
were collected rather than what exists at the time of the
consumption survey. Significant migration of young heads of household
seeking urban employment or settling new lands are common
situations in Africa that give rise to a need for adjusting sampling
weights in light of what enumerators find at the time of listing.
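
The quick comparison of listed and census figures described above
can be sketched in a few lines. This is only an illustration: the
function name, the cluster figures and the 10 percent tolerance are
assumptions made for the example, not values taken from any of the
surveys reviewed.

```python
def listing_check(measure_of_size, listed_households, tolerance=0.10):
    """Compare the ex ante measure of size with the field listing.

    Returns (ratio, needs_adjustment). The tolerance is an assumed
    cut-off chosen for illustration only.
    """
    if measure_of_size <= 0:
        raise ValueError("measure_of_size must be positive")
    ratio = listed_households / measure_of_size
    return ratio, abs(ratio - 1.0) > tolerance


# Hypothetical clusters: (census households, households found at listing)
clusters = {
    "rural_A": (120, 124),
    "urban_B": (150, 210),
}
report = {name: listing_check(mos, listed)
          for name, (mos, listed) in clusters.items()}

for name, (ratio, flag) in report.items():
    print(name, round(ratio, 2), "adjust weights" if flag else "ignore")
```

Where the ratio is flagged, it is the factor by which the weights of
households in that cluster would be adjusted.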

Turning to the question of overall sample size, what is
appropriate depends upon what policy makers intend to do with the
data, the amount of variability in the population with respect to
the kind of data being collected, sampling methodology and
resource and operational constraints. Megill and Dauphin (1986) of
the US Bureau of the Census indicate that experience with
consumption surveys in LDCs suggests that in countries with population
distributions similar to Haiti, 3000 households should be
sufficient to provide reliable urban and rural estimates at the
national level as well as estimates of predominant characteristics
at the regional level. The study in Liberia got good results for
urban areas with 942 households. But one would not expect to
obtain the same results in Zaire or in some Asian countries where
consumption patterns vary significantly between regions. Where a
previous consumption survey exists, the variance in that sample can
be used to determine more precisely an appropriate sample size for
the current survey.
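
As a rough illustration of how variance from an earlier survey might
feed into sample size, the sketch below uses the textbook
approximation n = deff x (z x CV / e)^2. The formula, the function
name and all input values are assumptions introduced here; the
report itself does not prescribe a formula.

```python
import math


def required_households(cv, rel_error, deff=1.0, z=1.96):
    """Approximate number of households needed to estimate a mean.

    cv        : household-level coefficient of variation taken from a
                previous survey (hypothetical input)
    rel_error : tolerated relative error of the estimated mean
    deff      : assumed design effect of the multistage design
    z         : normal deviate for the confidence level (1.96 ~ 95%)
    """
    if cv <= 0 or rel_error <= 0 or deff <= 0:
        raise ValueError("cv, rel_error and deff must be positive")
    n_simple_random = (z * cv / rel_error) ** 2
    return math.ceil(deff * n_simple_random)


# Hypothetical example: CV of 60%, +/-5% relative error
n_srs = required_households(0.6, 0.05)
n_clustered = required_households(0.6, 0.05, deff=1.25)
print(n_srs, n_clustered)
```

The calculation would be repeated for each stratum or domain for
which separate estimates are wanted.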

In most surveys it is desirable for operational reasons to
use multistage sampling. This greatly reduces the cost of
identifying a sample and interviewing the selected households. Survey
work is more concentrated and quality can be more easily
controlled. As a result, non-sampling errors are usually
considerably smaller.

On the other hand, multistage sampling introduces an element
of similarity between sample units that is usually greater than in
the population at large. This arises from people's tendency to
live near people more like themselves. The higher the correlation
between sample units within a cluster, the more precision is lost
by concentrating a given sample in fewer clusters.

Megill and Dauphin (1986) indicate that studies of intraclass
correlation done for consumption surveys in general suggest that
the range of optimum values for the number of sample households
per cluster normally includes 10. This was the number used in the
Haiti survey. The Kilimanjaro survey used eight per cluster
(Zalla, 1982). There, eight households in 36 first-stage sample
units in a relatively homogeneous area produced coefficients of
variation for estimated means for universally distributed
nutrition variables that averaged 2.5% when the data were treated as a
simple random sample and 2.8% when treated correctly as a
two-stage sample. With commodity by commodity consumption data
the standard errors would be considerably larger. And the
relative differences between the simple random and two-stage
estimates would be larger in less homogeneous areas. But these data
support the general conclusion that sampling 8-10 households per
cluster provides a good balance between providing sufficient
first-stage sample units to generate reasonably precise estimates
on the one hand, and concentrating data collection efforts in
order to control data quality and minimize costs, on the other.
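
The Kilimanjaro figures can be turned into an indicative design
effect. The sketch below squares the ratio of the two coefficients
of variation and then backs out an implied intraclass correlation
using the common approximation deff = 1 + (m - 1) x rho. That
approximation is an assumption added here, and because the 2.5% and
2.8% figures are averages over several variables the result is only
a rough indication.

```python
def design_effect(cv_two_stage, cv_simple_random):
    """Ratio of variances, i.e. the squared ratio of the two CVs."""
    return (cv_two_stage / cv_simple_random) ** 2


def implied_intraclass_correlation(deff, households_per_cluster):
    """Back out rho from deff = 1 + (m - 1) * rho (assumed model)."""
    return (deff - 1.0) / (households_per_cluster - 1)


deff = design_effect(2.8, 2.5)                 # about 1.25
rho = implied_intraclass_correlation(deff, 8)  # about 0.04
effective_n = 36 * 8 / deff                    # about 230 of 288 households
print(round(deff, 3), round(rho, 3), round(effective_n))
```

On these assumptions, the 288 clustered households carry roughly the
information of 230 households drawn by simple random sampling.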

Taking the mean values of 36 first-stage units to calculate a
variance for a two-stage estimate is not the same as using only 36
observations. The first-stage means already average out a great
deal of variation in the data. Only where each cluster of sample
units was very homogeneous within a cluster and very heterogeneous
between clusters would the intraclass correlation be so high as to
greatly reduce the precision obtained from a multistage sample
versus a simple random sample of the same number of elements as
psus in the multistage sample. Even in these cases the population
can often be divided into strata so as to maximize variation
between the strata and minimize variation between first-stage sample
units. Such stratification sometimes more than offsets the effect
of clustering within a stratum. Stratification is frequently
quite feasible in urban areas where housing patterns are a good
proxy for consumption patterns.

What this all means is that there is not usually a great
conflict between the need to identify, interview and tabulate sample
elements rapidly, and the need to have results that reasonably
accurately reflect food consumption patterns within the sample area.
The major additional cost of a more rigorous and statistically
sound sampling methodology is the time and money required to list
all sample units within a greater number of sample clusters and to
make whatever social and political introductions are necessary to
enlist the cooperation of sample households within each of the
selected clusters. Transportation time and costs will also be
greater but they need not slow the collection of data appreciably
over a concentrated sample population. Preparation of the sample
frame and actual collection, tabulation and analysis of the data
are not affected in a significant way. This is where the greatest
portion of time and money is normally spent.


V.      THE  NEG  RAPID  APPRAISAL  WORKSHOP 


The NEG April 1987 workshop on Rapid Appraisal Techniques
For Food Policy Analysis included an entire session on sampling
issues. The discussion centered around the importance of
minimizing non-sampling errors in data collection; the one-visit,
staggered approach to data collection used in the Haiti survey; and
the tendency for secondary users of survey data to analyze the
data as though they were collected using a simple random sample
methodology. Most of the issues relating to benefits of
stratification and multistage sampling mentioned above were also
discussed.

As with many of the sessions, several of these issues were
raised without moving to a resolution or a position being taken.
This was to be the purpose of the afternoon working-group
discussions. These, in turn, were too short for what had to be done.
As a result, the afternoon working sessions did not really
accomplish much more than greater depth of discussion of issues
raised in the plenary session. For this reason one cannot really
say that the sampling methodology for rapid collection of
consumption and expenditure data recommended in this paper reflects a
consensus of opinion among the data collection methodology
discussion group. At the same time, however, little in the methodology
conflicts with what was said, either.

The one point about which there was some disagreement was the
desirability of sampling so as to obtain a self-weighting sample.
One of the participants felt this was too restrictive and
unnecessary. It usually requires cutting corners that, in some
circumstances, may not be wise. This certainly is a valid concern where
the data on which the sampling frame is based are out of date.
For this reason the recommended sampling methodology facilitates a
self-weighting sample, but does not require it.

One of the more interesting ideas discussed at the workshop
was the methodology being used in Haiti to gather data. As was
mentioned previously, the selected first-stage sample units were
divided into 13 systematically selected subsamples. One of the
subsamples was interviewed every four weeks. Each selected
household was thus interviewed only once.
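
One way such a division into 13 subsamples could be made is sketched
below. The exact procedure used in Haiti is not described in this
report, so the random start, the function name and the use of 39
hypothetical first-stage units are all assumptions made for
illustration.

```python
import random


def assign_subsamples(psu_ids, n_subsamples=13, seed=None):
    """Spread an ordered list of selected psus over interview rounds.

    Every n_subsamples-th psu on the list goes to the same round, so
    each round is itself a systematic subsample of the full sample.
    """
    rng = random.Random(seed)
    start = rng.randrange(n_subsamples)
    rounds = {k: [] for k in range(1, n_subsamples + 1)}
    for position, psu in enumerate(psu_ids):
        round_no = (start + position) % n_subsamples + 1
        rounds[round_no].append(psu)
    return rounds


# Hypothetical sample of 39 psus, listed in their order of selection
rounds = assign_subsamples(list(range(39)), seed=7)
for round_no, psus in rounds.items():
    print("four-week period", round_no, "->", psus)
```

Because each round spans the whole ordered list, stopping after any
number of rounds still leaves a sample spread over the full frame.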

Though workshop participants did not reach any obvious
conclusion regarding the merits or disadvantages of the single
interview versus the multiple interview approach, many of the points
raised in the keynote paper by the author (Zalla, 1987) were
mentioned. These included the ease of revisiting a previously
interviewed household versus the probability of frequent absence or
non-response of a sampled household on subsequent interviews
because of the unusual mobility of Haitians. There was some
skepticism among participants that multiple interviews of the same
household increase cooperation and data quality as much as is
frequently assumed. Nor, in Haiti, does it seem necessary to spend a
great deal of time informing and gaining approval of respondents
for conducting the interview. The major cost there is the cost of
conducting an actual household interview. Since the within-household
correlation between consumption or expenditures at different
periods of time is probably quite high, many fewer actual
household interviews are required to obtain a given level of precision
with the single interview approach. All of this assumes, of
course, that the analytical objectives do not require data on
individual households. This is a common situation for expenditure
surveys but is less true for consumption surveys.

In a context where data are collected on an individual
household more than once, the sampling methodology effectively includes
another stage. The last stage is household consumption or
expenditures over time. More than one observation gives a better
measure, ceteris paribus, than does a single observation. In
effect, sample size at this last stage is equal to the number of
households surveyed times the number of interviews per household.
A single interview per household for a sample of given size
substantially reduces effective sample size. Though it would depend
on just how stable consumption and expenditure patterns within a
given household are over time, a single interview sample would
need to contain a larger number of households than a multiple
interview sample to obtain an equal level of precision. It would,
however, generally require far fewer total interviews.

The increase in sample size required to obtain an equal level
of precision depends on the magnitude of the intraclass (i.e.
intrahousehold) correlation between consumption or expenditures in
one period and that in another. This intraclass correlation will
probably be higher in urban areas than in rural areas. It will
probably be higher in very wealthy and very poor households as
well. It will also probably be higher for expenditures than for
consumption. The higher this correlation, the greater will be the
gains in precision from administering a single interview to a
larger sample. These are measurable phenomena, and this is an
important issue for NEG. It bears directly on rapid appraisal data
collection methodology where individual household level data are
not required.
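
The trade-off can be made concrete under a simple assumed model in
which repeated interviews of the same household are equally
correlated with correlation rho. The variance of a mean from n
households interviewed k times each is then proportional to
(1 + (k - 1) x rho) / (n x k). The model, the function name and the
figures below are illustrative assumptions, not results from the
workshop.

```python
import math


def single_interview_households(n_households, interviews_each, rho):
    """Households needed, at one interview each, to match the precision
    of n_households interviewed interviews_each times (assumed model)."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie between 0 and 1")
    inflation = 1.0 + (interviews_each - 1) * rho
    return math.ceil(n_households * interviews_each / inflation)


# Hypothetical design: 300 households visited 4 times (1200 interviews)
for rho in (0.0, 0.5, 0.8, 1.0):
    needed = single_interview_households(300, 4, rho)
    print(f"rho={rho}: {needed} single interviews instead of 1200")
```

With a high intrahousehold correlation the single-interview sample
needs only slightly more households than the repeated-visit sample,
and far fewer interviews in total.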




Before one can make a definitive pronouncement concerning the
potential savings in cost and time of the single-interview,
time-phased data collection technique used in Haiti, NEG will need
to finance a combination consumption/expenditure survey that
includes an examination of both the design effect and the effect on
data quality arising from interview frequency. This is an issue
quite distinct from the sampling methodology suggested in this
report. It concerns primarily optimal sample size for the two
approaches to data collection. The sampling methodology itself can
accommodate virtually any approach to data collection.


VI.    RECOMMENDED  SAMPLING  METHODOLOGY  FOR  NEG  SURVEYS 

Because of the wide variation in circumstances under which
this sampling methodology will be applied, it is desirable that it
be statistically rigorous while admitting shortcuts that have a
more or less determinable effect on the precision and accuracy of
the estimates. A good way to accomplish this is with a multistage
stratified sample that is largely self-weighting. Multistage
sampling reduces the cost of data collection but creates a design
effect that reduces the precision of estimates vis-a-vis simple
random sampling. Appropriate stratification, on the other hand,
improves efficiency over simple random sampling and helps offset
the design effect of multistage sampling.

Sampling Frame and Sampling Procedures

A list of census enumeration areas or of the lowest level
administrative units and their respective sample unit populations
makes a readily accessible and easy to use first-stage sampling
frame for consumption surveys in most countries. These
first-stage units can usually be grouped into strata and substrata
of known population and economic or geographical characteristics.
They are usually small enough that one enumerator can cover the
area on foot.


The most frequent criteria used to stratify sample units in
consumption surveys are geographic (urban versus rural),
ecological (as these relate to major differences in agricultural
production patterns) and economic (large differences in income as
indicated by housing patterns, location, ethnic group or race).
Strata and substrata are defined according to a hierarchy of
importance vis-a-vis analytical objectives and the most logical
sampling procedure. Ideally, the substrata should be defined such
that proportional allocation of the sample to the various
substrata within a stratum will provide a reasonably efficient sample
design. This can limit necessary weighting to the stratum means,
reducing substantially the aggregation burden for manually
tabulated surveys. Moreover, the strata should be defined so as
to maximize the variation between the strata and minimize the
variation within a stratum. This improves the precision of the
estimates over those from a simple random sample. All that is
required is that the sub-populations be known and clearly defined.
Each administrative subdivision or EA is assigned to the
substratum which appears to contain most of its population.


In some cases, no information will be available at a central
level on the number or size of the lowest level administrative
units. In these cases it will be necessary to increase the number
of stages of sampling, utilizing first only that information which
is centrally available, and then moving into those subdivisions
selected at the first stage to gather the additional information
needed for sampling at subsequent stages. In this way researchers
avoid having to gather information for non-selected regions. This
process continues until the ultimate clusters containing the
individual sample elements are identified. It is important to keep in
mind, however, that the most important single influence on the
precision of estimates obtained from a multistage sample is the
number of cluster units included in the sample, not overall sample
size. For this reason the sampling frame should be designed so
that a minimum sample of 25-35 first-stage units within each
stratum is feasible, and each substratum has at least two first-stage
sample units.


To select the first-stage sample units to be interviewed,
proceed in the following manner:

1) Prepare a list of all first-stage sample units in the
substratum. In many cases it will prove useful to list these psus in
serpentine fashion or in some other order that provides implicit
stratification of the psus.

2) On the list note the sample population of each of the
psus.

3) Construct a cumulative population listing, beginning with
the sample population of the first psu on the list and adding to
it the sample population of each subsequent psu. In this way the
sample population of each psu occupies a unique and determinable
interval within the cumulative list for the stratum.

4) When the listing is complete, divide the total number of
sample elements in the stratum by the number of first-stage sample
units desired in each stratum, preferably 25-30 per stratum. This
yields the systematic sampling interval for selecting sample psus.

5) Then draw a random number between zero and the sampling
interval. The first-stage sample unit containing that household
number in its cumulative total becomes the first selected psu.

6) Proceed to select the remaining first-stage units by
adding the systematic sampling interval to the first number and
selecting each psu containing the designated sample element in its
cumulative total. If one of the first-stage sample units is
larger than the systematic sampling interval it will be selected
more than once. In that case it should be allocated one
second-stage sample unit for each time it is chosen. This process
continues until the sample units at all stages except the last
have been identified.
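
Steps 1 through 6 can be sketched as follows. The function name,
its arguments and the six-psu frame are hypothetical and serve only
to illustrate the procedure; the optional `start` argument is
included so that a hand-drawn random number can be reproduced.

```python
import bisect
import random
from itertools import accumulate


def pps_systematic(sizes, n_select, start=None, seed=None):
    """Systematic selection with probability proportional to size.

    sizes    : sample population of each psu, in list order (step 2)
    n_select : number of first-stage units wanted (step 4)
    Returns (interval, indices of the selected psus). A psu larger
    than the interval can appear more than once.
    """
    if n_select < 1 or not sizes or min(sizes) < 0 or sum(sizes) <= 0:
        raise ValueError("need positive sizes and n_select >= 1")
    cumulative = list(accumulate(sizes))              # step 3
    interval = cumulative[-1] / n_select              # step 4
    if start is None:                                 # step 5
        start = (1.0 - random.Random(seed).random()) * interval
    selected = []
    for hit in range(n_select):                       # step 6
        target = start + hit * interval
        index = bisect.bisect_left(cumulative, target)
        selected.append(min(index, len(sizes) - 1))
    return interval, selected


# Hypothetical frame of six psus; the fourth exceeds the interval
frame = [100, 250, 50, 600, 200, 300]
interval, picks = pps_systematic(frame, 3, start=450)
print(interval, picks)
```

In this example the fourth psu is selected twice and would therefore
be allocated two second-stage sample units.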

Once researchers have identified the ultimate clusters to be
interviewed, enumerators must prepare an exhaustive list of sample
units living in the selected clusters. If census enumeration
areas are the first-stage sample units this requires that their
boundaries first be identified on the ground. Enumerators will
then pass from house to house or from local leader to local leader
to obtain the names and addresses of households living within the
borders. Usually housing units rather than households are listed,
and information is obtained to distinguish permanently occupied
housing units (eligible units) from vacant, seasonal and other
ineligible units. Adequate operational control is required to avoid
duplication and omission and to ensure that the information
obtained for each unit will be sufficient for the interviewer to
locate the housing unit later if it is selected.

Frequently  there  will  be  a  list  of  all  household  heads  in  an 
administrative  unit  at  the  headquarters  of  the  unit.  This  is  one 
advantage  of  using  administrative  units  as  a  sampling  frame.  The 
boundaries  are  clear.  Boundaries  for  census  enumeration  areas,  on 
the  other  hand,  sometimes  cross  political  boundaries.  Administra- 
tive  lists,    however,    may  not   be  complete  or   may   be  out   of  date. 

The selection of the sample elements to be interviewed
proceeds in the same fashion as at previous stages, except that the
individual housing unit names are numbered and cumulated. The
systematic sampling interval is determined by dividing the total
number of housing units in the cluster by the size of sample to be
chosen in the cluster, preferably 8-10 households per cluster.
Those housing units with the serial numbers falling on the
systematic sampling interval are then the ones to be interviewed for the
survey.

In order to anticipate the likelihood of needing to replace
certain elements in the sample it is desirable to draw more sample
elements than one anticipates actually interviewing. This can
most effectively be accomplished by increasing the initial desired
sample size per cluster by 30-50% and then randomly drawing a
subsample of the desired size. Alternatively, one can add or
subtract one-half of the systematic sampling interval and take those
households that fall on multiples of two times the interval as
replacement households, selecting one at random as needed. Still,
every effort should be made to avoid substitution and to retain
the original sample as much as possible.
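
A minimal sketch of the first approach, drawing a systematic sample
inflated by a reserve and then taking a random subsample for
interview, is given below. The function name, the 40 percent
reserve and the cluster of 87 listed housing units are assumptions
made for the example.

```python
import random


def draw_households(n_listed, n_interview=10, reserve_pct=40, seed=None):
    """Systematic draw of serial numbers 1..n_listed with a reserve.

    Returns (primary, reserve): the households to interview and the
    remaining drawn households held back as possible replacements.
    """
    rng = random.Random(seed)
    # Inflate the draw by reserve_pct, rounding up (integer arithmetic)
    n_draw = n_interview + (n_interview * reserve_pct + 99) // 100
    if n_draw > n_listed:
        raise ValueError("cluster too small for the requested draw")
    # Random start within the first interval, kept in integer form
    r = rng.randrange(n_listed)
    drawn = [(r + i * n_listed) // n_draw + 1 for i in range(n_draw)]
    primary = sorted(rng.sample(drawn, n_interview))
    reserve = [serial for serial in drawn if serial not in primary]
    return primary, reserve


primary, reserve = draw_households(87, n_interview=10, reserve_pct=40, seed=1)
print("interview:", primary)
print("replacements:", reserve)
```

Replacements would be taken from the reserve list at random, and
only when a selected household cannot be interviewed.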


In most real world situations pertaining to consumption
surveys we have only a measure of the size of the population we want
to study rather than the actual size. In order to maintain the
representativeness of the sample we need to adjust the weights for
the individual data observations for differences between the
measure of size used to select a particular sample element, and the
actual probability of selection as determined after we acquire
more information about the sample units. At the same time, in
order to obtain population totals, each element in the sample must
be multiplied by the inverse of its selection probability, i.e.
expanded, so as to reflect the fact that the sample represents a
larger number of households in the population. These factors are
all incorporated into a weight which is attached to each household
record and applied to values that need to be extrapolated.

No matter how many stages are used in the sample selection
process, the extrapolation factor or weight attached to a selected
sample element is the reciprocal of that element's probability of
selection. An element's probability of selection is simply the
product of its probability of selection at each stage. Where
sampling of all clusters within a substratum is with probability
proportional to size (pps), and the number of sample units drawn at
each stage after the first is constant across all elements in that
stage, all elementary units within a substratum have an equal
probability of selection and the sample is approximately
self-weighting. Using notation similar to that used by Megill and
Dauphin (1986) for the Haiti survey, this can be seen for a
three-stage sample from the following:

$$P_{hijk} = \frac{n_h M_{hi}}{M_h} \cdot \frac{m_h M_{hij}}{M_{hi}} \cdot \frac{c_h}{M'_{hij}} \qquad (1)$$

where:

$P_{hijk}$ = probability of selecting the k-th household in the j-th
second-stage sample unit in the i-th first-stage sample
unit in substratum h, constant for all households in j.

$n_h$ = number of first-stage sample units selected from
substratum h, variable from h to h.

$M_{hi}$ = ex ante estimate of the household population of the i-th
first-stage sample unit in substratum h

$M_h$ = ex ante estimate of the household population of
substratum h

$m_h$ = number of second-stage units selected from the i-th
first-stage sample unit of substratum h, constant for
all i within substratum h.

$M_{hij}$ = ex ante estimate of the household population of the
j-th second-stage sample unit in the i-th first-stage
sample unit of substratum h

$M'_{hij}$ = listed number of households in the j-th second-stage
sample unit in the i-th first-stage unit in substratum h

$c_h$ = number of households selected in the j-th second-stage
sample unit in substratum h, constant for all j within
substratum h.

The first term in equation (1) is the probability of
selecting a given first-stage sample unit in substratum h. The second
term is the probability of selecting a second-stage unit given
that the i-th first-stage unit has already been selected. The last
term is the probability of selecting a household within the
selected second-stage sample unit. Where the number of households
actually listed in the selected second-stage sample unit, $M'_{hij}$,
equals the ex ante estimate of the household population of the
unit, $M_{hij}$, equation (1) reduces to:

$$P_{hijk} = \frac{n_h m_h c_h}{M_h} \qquad (2)$$

This means that if a constant number of second-stage units
are selected within each selected first-stage unit, and a constant
number of third-stage units are selected within each selected
second-stage unit, using pps in the first two stages and equal
probability at the last stage, each household within the
substratum has an equal probability of selection. The sample is,
consequently, self-weighting. When the factors $M'_{hij}$ and
$M_{hij}$ are not equal but are close, the sample is only
approximately self-weighting. Where a two-stage sample rather than
a three-stage sample is used, the term $m_h$ drops out and each unit
still has an equal probability of selection. In both cases, if
first-stage sample elements are allocated between substrata with
pps, and $m_h$ and $c_h$ are equal between those substrata, the
sample is self-weighting within a stratum. The same is true for
the main strata. However, it is not usually advantageous to make
a proportional allocation at that level because of variability in
the data and substantial differences in population between the
various strata, especially between urban and rural and high and
low income strata.
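
The self-weighting property can be checked numerically. The sketch
below is only an illustration. It assumes that equation (1) is the
product of the three stage probabilities described above; the
function name, the symbol M_hi for the first-stage measure of size
and all of the figures are hypothetical.

```python
from fractions import Fraction


def selection_probability(n_h, M_h, M_hi, m_h, M_hij, c_h, M_hij_listed):
    """Overall household selection probability in a three-stage design.

    Assumes pps selection at the first two stages and equal-probability
    selection of c_h households from the listed households at the last stage.
    """
    p_first = Fraction(n_h * M_hi, M_h)      # first-stage unit, pps
    p_second = Fraction(m_h * M_hij, M_hi)   # second-stage unit, pps
    p_third = Fraction(c_h, M_hij_listed)    # household within the cluster
    return p_first * p_second * p_third


# Hypothetical substratum: 10,000 households, 5 first-stage units,
# 2 clusters per first-stage unit, 10 households per cluster.
design = dict(n_h=5, M_h=10_000, m_h=2, c_h=10)

# Two clusters of very different size, each listed exactly as estimated.
p_large = selection_probability(M_hi=1_000, M_hij=400, M_hij_listed=400, **design)
p_small = selection_probability(M_hi=500, M_hij=100, M_hij_listed=100, **design)

# A cluster whose listing found 125 households instead of the estimated 100.
p_grown = selection_probability(M_hi=500, M_hij=100, M_hij_listed=125, **design)

print(p_large, p_small, p_grown)
```

When the listed and estimated cluster sizes agree, both households
have the same probability, n_h m_h c_h / M_h. When they differ, the
probability changes and the sample is only approximately
self-weighting.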

Where there is reason to believe that there is greater
variability between strata, obtaining the most cost-effective
population estimates requires optimal allocation of sample units
between the strata. Essentially, this means allocating sample
observations to the various strata as a function of variability
within each stratum and the cost of interviewing a sample unit in
that stratum. At the same time, when separate estimates with a
minimum precision are needed for each stratum, a minimum sample
size would have to be determined for each stratum on its own.
Merely allocating the overall sample to the strata, whether by
optimum allocation or proportional allocation, would not
guarantee the required level of precision.
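
As an illustration of optimal allocation, the sketch below uses the
standard textbook rule (see, for example, Cochran in the
bibliography) that a stratum's share of the sample is proportional
to its population times its standard deviation, divided by the
square root of its cost per interview. The report does not state
this formula; the function name and all of the figures are
hypothetical.

```python
import math


def optimal_allocation(total_sample, populations, std_devs, unit_costs):
    """Split a fixed sample between strata in proportion to N_h * S_h / sqrt(cost_h)."""
    shares = [
        N * S / math.sqrt(cost)
        for N, S, cost in zip(populations, std_devs, unit_costs)
    ]
    total_share = sum(shares)
    return [total_sample * share / total_share for share in shares]


# Hypothetical strata: a large, homogeneous, cheap-to-survey stratum and a
# smaller, more variable stratum where an interview costs four times as much.
allocation = optimal_allocation(
    total_sample=200,
    populations=[6_000, 4_000],
    std_devs=[10, 30],
    unit_costs=[1, 4],
)
print(allocation)
```

With these figures the two strata receive equal samples, whereas
proportional allocation would split the same 200 interviews 120 to
80. Neither rule by itself guarantees a minimum precision within
each stratum.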

In practice $M'_{hij}$ will not equal $M_{hij}$. Population will
have changed since the last measure, no matter how recent it is.
Moreover, when a listed housing unit is visited, the enumerator will
sometimes find more or fewer households than was anticipated on
the basis of the household listing. To be rigorous, the weights
attached to those households actually interviewed within those
last-stage sample units need to be adjusted to reflect the
modified probability of selection.

In  the  case  of  finding  additional  households  where  there  was 
thought  to  be  only  one,  one  of  the  households  should  be  selected 
for  the  interview  and  its  weight  multiplied  by  the  number  of 
household     units     identified.  This  corrects  for   the     fact  that 

there  are  more  households  of  this  type  in  the  population  than  was 
foreseen   in   the  original  sample. 


Where a listed household no longer exists (is vacant, destroyed
or is seasonal) at the time of the interview and a new household
has been substituted in order to maintain the level of precision
desired for the estimates, the weights of all of the other
households interviewed within the second-stage unit need to be
adjusted. The adjustment would reflect the fact that there are
fewer households in that cluster than was anticipated. The
adjustment factor would be:

$$\frac{c_h - d_{hij}}{c_h} \qquad (3)$$

where:

$d_{hij}$ = number of ineligible sample households in the
$j$th second-stage sample unit in the $i$th first-stage
sample unit in substratum $h$

The total weight of each household is then the inverse of its
initial probability of selection multiplied by the appropriate
adjustment factors. In cases where the ineligible households are
not replaced in the sample, such a weight reduction is not
necessary.

Other adjustments in the weights may be necessary to allow
for non-response and consolidation of households. The
non-response adjustment is required when fewer than $c_h$ households
are interviewed in the cluster despite substitution efforts. The
adjustment factor is simply $c_h$ divided by the number of completed
interviews within the cluster $j$ and is multiplied by the weight of
each interviewed household in the cluster. Since this adjustment
can introduce bias it should not be used when there are more than
25-30% missing units.

The adjustment for consolidation is necessary when an
interviewed household's probability of being interviewed was
augmented because it occupies more than one housing unit. This
occurs when one household occupies the same space that was
occupied by two different households at the time of listing. As a
consequence, its probability of selection has doubled since it
will be included if either one of the two originally listed
households is selected. It is therefore necessary to multiply the
weight of this interviewed household by one-half.
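
The adjustments for additional households, non-response and
consolidation can be combined as in the following sketch. It is an
illustration only: the function and parameter names are
hypothetical, the 30% cut-off is one reading of the 25-30% guideline
above, and the adjustment for ineligible households is left out.

```python
def final_weight(base_weight, households_found=1, selected=None,
                 completed=None, housing_units_occupied=1,
                 max_missing_share=0.30):
    """Apply the weight adjustments described in the text to one household.

    base_weight            -- inverse of the initial probability of selection
    households_found       -- households found where one was listed
    selected, completed    -- c_h and the number of completed interviews in
                              the cluster (non-response adjustment)
    housing_units_occupied -- listed housing units the household now occupies
    """
    weight = base_weight * households_found
    if selected is not None and completed is not None:
        if completed <= 0:
            raise ValueError("no completed interviews in the cluster")
        if 1 - completed / selected > max_missing_share:
            raise ValueError("too many missing units for a non-response adjustment")
        weight *= selected / completed
    return weight / housing_units_occupied


print(final_weight(100.0, households_found=2))           # extra household found
print(final_weight(100.0, selected=10, completed=8))     # two non-responses
print(final_weight(100.0, housing_units_occupied=2))     # consolidated household
```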

Unless there is some reason to suspect that significant
demographic changes have occurred since the data used to construct
the sampling frame were originally collected, the bias introduced by
ignoring the adjustments for missing and unanticipated sample
units will sometimes not be great in relation to total sampling
and non-sampling errors. In such cases it would be reasonable to
assume that such phenomena occur at random throughout the
population. Simply replace a missing household with one of the
substitutes. Where there is more than the one sample unit, choose
one for the interview and proceed as though nothing occurred. In
both cases the sample size will remain constant, preserving the
self-weighting sample and keeping work schedules intact. At the
same time, make note of these occurrences in the event that a
subsequent tabulation by computer permits use of variable weights.
Variable weights can be attached at that time. Of course, users of
the data should be informed of any such shortcuts that could
compromise the usefulness of the data for their particular
analytical purposes. Where a data base is expected to be used by a
number of analysts for as yet unidentified purposes, it would be
best to make every effort to calculate the correct weights.

In addition to weights associated with sampling probabilities
for the selected elementary sample units, each elementary unit
requires a weight to extrapolate recorded expenditures to the
appropriate time interval. This weight will depend on the data
collection methodology employed for the survey. To estimate
annual consumption and expenditures, the extrapolation factor would
be 365/r where r is the reference period covered by the survey (in
days). This factor will normally vary for different types of
expenditures within the same survey, as well as for different
surveys. [The sentence that follows is only partly legible in the
source: "For ... expenditures r would normally be rather ..."]
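
A minimal sketch of the time extrapolation follows. The function
name, the reference periods and the amounts are hypothetical.

```python
def annualize(recorded_amount, reference_period_days):
    """Extrapolate an amount recorded over r days to a year using 365/r."""
    if reference_period_days <= 0:
        raise ValueError("reference period must be positive")
    return recorded_amount * 365 / reference_period_days


# The factor differs by type of expenditure within one survey, for example
# a 7-day reference period for food and a 30-day period for clothing.
annual_food = annualize(14.0, 7)
annual_clothing = annualize(12.0, 30)
print(annual_food, annual_clothing)
```

Each annualized amount would then be multiplied by the household's
sampling weight before totals are formed.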

C.  Statistical Inference

When done via a cumulative population listing, systematic
sampling of psus is equivalent to sampling without replacement.
Generally speaking, sampling without replacement is more precise
than sampling with replacement. When the sampling fraction is
small, the chance of the same unit being selected twice is also
small. In this case sampling with replacement is practically
equivalent to sampling without replacement. Frequently, however,
especially when using sampling frames based on administrative
units, the first-stage sampling fraction will be quite high. In
this case it will be necessary to introduce the finite population
correction factor (fpc) to avoid overestimating the variance when
sampling is done without replacement.¹
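
Systematic selection from a cumulative population listing can be
sketched as follows. This is an illustration only: the function
name, the unit sizes and the fixed random start are hypothetical,
and the sketch assumes each unit occupies an interval on the
cumulative list equal to its measure of size.

```python
import random


def systematic_pps(sizes, n, start=None):
    """Select n units with probability proportional to size.

    Units are laid end to end on a cumulative list; a random start within
    the first sampling interval and every interval-th point after it each
    pick the unit whose cumulative interval contains the point.
    """
    interval = sum(sizes) / n
    if start is None:
        start = random.random() * interval   # random start in [0, interval)
    selected = []
    index = 0
    cumulative = 0                            # lower bound of the current unit
    for k in range(n):
        point = start + k * interval
        while cumulative + sizes[index] <= point:
            cumulative += sizes[index]
            index += 1
        selected.append(index)
    return selected


# Hypothetical first-stage units with their household counts (total 1,000).
sizes = [120, 80, 200, 50, 150, 100, 300]
chosen = systematic_pps(sizes, n=4, start=100)
print(chosen)
```

A unit larger than the sampling interval can be hit more than once,
so in practice very large units are often handled separately. With
4 of 7 units selected, the first-stage sampling fraction here is
high enough that the fpc matters.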

To maintain the widest range of flexibility for estimating
parameters and their variance we will assume that the sample is
only approximately self-weighting. This means that estimates will
be biased in proportion to the error in the measures of size.
However, the variance of the estimates proposed here will often be
substantially smaller than with many unbiased estimates under the
same circumstances. Moreover, the bias in the estimates will not
be large unless the measures of size are substantially in error.
This will not usually be the case with census or administrative
data.

Turning first to estimating totals, means and proportions,
again using notation similar to Megill and Dauphin (1986), the
survey estimate of a total would be:

$$X_A = \sum_{k \in A} W_k X_k \qquad (4)$$

where:

$X_A$ = total of variable $X$ for group $A$

$A$ = subset of records belonging to group $A$

$W_k$ = final weight for the $k$th record

$X_k$ = value of variable $X$ for the $k$th record

¹ The finite population correction factor for first-stage units
is 1 - n/N, where n is the number of first-stage units sampled
and N is the total number of first-stage units in the population.
The fpc factor is always less than one and always reduces the
variance.

To calculate means and proportions we use the ratio estimate
$Y_A / X_A$ where $Y_A$ and $X_A$ are the corresponding total
estimates. According to Megill and Dauphin (1986):

"...Means and proportions are special types of ratios.
In the case of a mean, the variable X in this ratio
would be equal to 1 for each record in group A, so that
the denominator would equal the sum of the weights for
group A. In the case of a proportion, the variable Y in
the numerator of the ratio would be either 1 or 0
depending on the presence or absence of a specified
characteristic." (p. 7)
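
The total and ratio estimates described above can be sketched in a
few lines. The function names, weights and values are hypothetical.

```python
def weighted_total(weights, values):
    """Weighted total: the sum over records of weight times value."""
    return sum(w * x for w, x in zip(weights, values))


def ratio_estimate(weights, numerator_values, denominator_values):
    """Ratio of two weighted totals computed over the same records."""
    return (weighted_total(weights, numerator_values)
            / weighted_total(weights, denominator_values))


def weighted_mean(weights, values):
    """A mean is a ratio whose denominator variable equals 1 for every record."""
    return ratio_estimate(weights, values, [1] * len(values))


def weighted_proportion(weights, has_characteristic):
    """A proportion is a mean of a variable coded 1 (present) or 0 (absent)."""
    return weighted_mean(weights, [1 if flag else 0 for flag in has_characteristic])


# Three hypothetical household records in group A.
weights = [100, 100, 200]
spending = [10, 20, 40]
buys_rice = [True, False, True]

print(weighted_total(weights, spending))        # estimated group total
print(weighted_mean(weights, spending))         # estimated mean per household
print(weighted_proportion(weights, buys_rice))  # estimated share of households
```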

The term $W_k$ in equation (4) can be a true final weight, i.e.
one that includes adjustment for missing or additional
households, or a simple extrapolation factor based on the initial
probability of selection for household $k$. In the former case,
$X_A$ is an asymptotically unbiased estimate of the population
total of $X$. In the latter case $X_A$ is biased and the bias is
proportional to the errors in the measures of size. As mentioned
previously, unless there is reason to believe that there have been
substantial population shifts since the population figures used in
the sampling frame were gathered, this bias is probably not
quantitatively substantial in relation to non-sampling error and
can be ignored. This permits the data to be tabulated as a simple
random sample with extrapolation factors applied only once, to the
sample totals.

Estimating variances is considerably more difficult. The
Haiti survey used SUPER CARP, a mainframe generalized variance
software package developed at Iowa State University, and the
corresponding microcomputer version, PC CARP. SUPER CARP provides
for the calculation of variances for estimates of totals, means,
proportions and other ratios. Apparently, the software does not
provide a finite population correction factor for multistage
sampling with first-stage selection probabilities proportional to
size. Whether or not this is a problem will depend on the
proportion of first-stage units included in the sample. In most
circumstances this will be less than 10% so that ignoring the fpc
factor will not make a great difference in the results.

CLUSTERS (Verma and Pearce, 1978) is another program that
estimates variances of multistage samples and cluster samples.
Annex A provides equations for calculating variances from two-stage
and three-stage samples selected with probabilities proportional to
a measure of size for those who do not have access to a canned
program for calculating variances.


VII.  CONCLUSION AND RECOMMENDATIONS

In  most  countries  data  on  census  enumeration  areas  are 
readily  available.  Census  enumeration  areas  are  uniform  in  size, 
have  a  relatively  known  population,  are  small  enough  that  they  can 
be easily grouped into relatively homogeneous strata and can
usually be covered by one enumerator on foot. Where enumeration
areas are selected with probability proportional to size and the
sample is self-weighting, relatively simple, two-stage estimating
procedures apply. In this case variances can be calculated with a
hand-held calculator.


The principal drawback of using EAs as sample units is the
time involved in obtaining maps or definitions of their boundaries
and obtaining the list of elementary sample units in the selected
EAs. Since EA boundaries do not always follow neighborhood or
political boundaries, special care is required to ensure that the
unit is properly defined. Some countries maintain detailed maps
and/or clear definitions of the boundaries. Many, however, do
not. Even where they do, they do not maintain an up-to-date list
of households. Consequently, it is virtually always necessary to
do an on-the-ground listing of elementary sample units before
being able to proceed with sampling within the selected EAs. Before
deciding to use census enumeration areas as first-stage sample
units in a rapid appraisal approach, researchers should verify
that sufficient information is available to ensure quick and
proper identification on the ground.

In contrast to census enumeration areas, good population data
are almost always available on administrative units down to the
lowest levels. Lists of households within the lower level units
are usually available at local government headquarters, often
eliminating the need for an on-ground listing.¹ But
administrative units have their downside too. The size and number
of the lower level units likely to be used as ultimate cluster
sampling units are not usually known in the capital city or in
regional capitals. This will normally necessitate the use of three
or more stages of sampling and require more complicated estimating
procedures.

Ideally, the first-stage sample units in an administrative
type sampling frame would be the lowest level administrative unit
for which population data are available at central levels. The
lowest unit for which the total number of units in the overall
population is known would then serve as the ultimate cluster.
This will ensure that data needed for the finite population
correction factor will be available. Sampling below that level
would be used only to provide unbiased estimates of the ultimate
cluster values used to make population estimates. Individual
household data would still be available for analyses requiring
such data.

¹ Administrative lists frequently are incomplete. But on-ground
listing is not always better. Outside enumerators overlook sample
units too, or rely on the recall of others who do the same.

This report presents procedures and estimators for using
both census enumeration areas and administrative units as a
sampling frame for gathering household consumption and expenditure
data. Other approaches could be used as well. Each approach will
have a separate set of estimators. Which one to use is a question
of the cost-effectiveness of the estimates. Stratified,
multistage sampling with probabilities proportional to size in the
context of a self-weighting sample normally provides very
cost-effective estimates. Sampling with probabilities proportional
to a measure of estimated size rather than actual size, the normal
situation under field conditions, provides estimates that are
almost as efficient, but somewhat biased. As indicated previously,
the amount of bias will not be large unless the measure of size is
substantially in error. Even this bias can be reduced if
individual sample element weights are adjusted to reflect the
divergence between the measure of size actually used and the actual
size found at the time of listing. Before such an adjustment
makes sense, however, researchers must satisfy themselves that
such differences are occurring because of population dynamics
rather than because of an imprecise definition of sampling
clusters at two different points in time. In practice, this will
not be easy.

As this report has shown, it takes little additional time to
develop and utilize a statistically rigorous sampling methodology.
The major cost is in terms of additional visits to a larger number
of primary sample units and the political and logistical
preparation that this requires. This is usually an insignificant
cost in relation to total survey costs. Yet a serious effort to
develop and apply an efficient and rigorous sampling methodology
greatly expands the analytical usefulness and the potential policy
relevance of consumption and expenditure surveys.


The specific recommendations of this report include the
following:

1) The Nutrition Economics Group should finance a combination
consumption and expenditure survey that includes a rigorous
examination of the design effect and the effect on data
quality of various interview frequencies aimed at developing
estimates valid for population groups. The study should
include a comparison of single interview, quarterly interview
and monthly interview approaches in an urban and a rural
environment.

2) Consumption/expenditure surveys should use statistically
valid, self-weighting, stratified, multistage probability
sampling wherever possible. This will greatly simplify
calculation of variances, reduce costs and help contain
non-sampling errors. The sampling methodology should include
the following:

- Use of census enumeration areas or the lowest level
administrative units for which population data are
centrally available as the primary sample units.

- Grouping of primary sample units into strata of
relatively homogeneous sample units.

- Structuring the sample so that there will be a
minimum of 25-30 primary sample units selected within
each stratum.

- Selecting sample clusters at all stages with
probability proportional to estimated size. At all stages
except the first, draw a constant number of sample units
within each stage.

- The size of the sample drawn at the last stage of
sampling should be 8-10 units per cluster. A larger
sample size per ultimate cluster would be called for
where the underlying variability in the population with
respect to the parameter to be measured is large.
Generally, however, a larger number of primary sample units
is preferable to a larger number of sample elements per
cluster for a given sample size.
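
The preference for more primary sample units over larger clusters
can be illustrated with a common approximation to the design
effect, 1 + (b - 1) x rho, where b is the number of sample elements
per cluster and rho is the intraclass correlation. This
approximation is a standard survey-sampling result and is not taken
from this report; the function names and the value of rho are
hypothetical.

```python
def design_effect(elements_per_cluster, intraclass_correlation):
    """Approximate design effect for equal-sized clusters: 1 + (b - 1) * rho."""
    return 1 + (elements_per_cluster - 1) * intraclass_correlation


def effective_sample_size(clusters, elements_per_cluster, intraclass_correlation):
    """Size of the simple random sample giving about the same precision."""
    total = clusters * elements_per_cluster
    return total / design_effect(elements_per_cluster, intraclass_correlation)


# Two ways of spending 300 interviews, assuming rho = 0.1.
many_small = effective_sample_size(30, 10, 0.1)   # 30 clusters of 10
few_large = effective_sample_size(15, 20, 0.1)    # 15 clusters of 20
print(round(many_small, 1), round(few_large, 1))
```

Under these assumptions the design with more, smaller clusters is
worth noticeably more interviews in simple random sample terms,
although it also costs more in travel and preparation.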




BIBLIOGRAPHY

Ahlers, Theodore H., et al. 198[?]. "Le Secteur Rizicole a
Madagascar: Etude du Secteur Rizicole, Rapport de la Premiere
Phase". Service de Coordination du Secteur Riz, Ministere de la
Production Agricole et de la Reforme Agraire and Associates for
International Development. Somerville.

Cochran, William G. 1953. Sampling Techniques. John Wiley and
Sons, Inc. New York.

Freudenheim, Jo L., Nancy Johnson and Robert Wardrop. 1987.
"Misclassification of Nutrient Intake of Individuals and Groups
Using One, Two, Three and Seven Day Food Records". American
Journal of Epidemiology, Vol. 126, No. 4.

Hansen, Morris H., William N. Hurwitz and William G. Madow. 1953.
Sample Survey Methods And Theory. John Wiley and Sons, Inc. New
York.

Hiemstra, Stephen J. 1987. "Urban Food Consumption Patterns And
National Food Policy In Liberia: Methodology And Evaluation".
Nutrition Economics Group, USDA and Purdue University.

Megill, David J. and Marjorie M. Dauphin. 1986. "Estimation
Procedures For The Haiti Household Expenditure And Consumption
Surveys: Weighting Procedures and Calculation of Variances". U.S.
Bureau of the Census. Washington, D.C.

Megill, David J. and Marjorie M. Dauphin. 1985. "Preliminary
Recommendations for Sample Design of Household Expenditure Survey
in Haiti". U.S. Bureau of the Census. Washington, D.C.

Senauer, Ben. 1987. "Rapid Appraisal: The Survey Instrument".
Report Prepared for the Nutrition Economics Group, USDA.

Spurr, William A. and Charles P. Bonini. 1967. Statistical
Analysis For Business Decisions. Richard D. Irwin. Homewood.

Verma, Vijay and Mick Pearce. 1978. "User's Manual For CLUSTERS: A
Package Program For Computation Of Sampling Errors For Clustered
Samples". International Statistical Institute, World Fertility
Survey. London.

Yates, Frank. 1960. Sampling Methods For Censuses And Surveys.
Hafner Publishing Company. New York.

Zalla, Thomas M. 1982. "Economic and Technical Aspects of
Smallholder Milk Production in Northern Tanzania." Unpublished
Ph.D. dissertation. Department of Agricultural Economics, Michigan
State University. East Lansing.

Zalla, Tom. 1987. "Toward Rapid Appraisal of Consumption and
Expenditure Data". Report prepared for the Nutrition Economics
Group, USDA.




ANNEX  A 

VARIANCE FORMULAS FOR A STRATIFIED, MULTI-STAGE SAMPLE
WITH SELECTION PROBABILITIES PROPORTIONAL TO A
MEASURE OF SIZE


Using an ultimate cluster type of estimator, and including
the fpc factor, equation (5) gives an estimate, slightly biased,
of the variance of a total (equation 4) for a self-weighting
(within strata), three-stage sample selected with probabilities
proportional to a measure of size at the first two stages:

$$\operatorname{Var}(X) = \sum_{h=1}^{l} W_h^2 \left(1 - \frac{n}{Q_h}\right) \frac{n}{n-1} \sum_{i=1}^{n_h} \sum_{j=1}^{m_h} \left(X_{hij} - \frac{X_h}{n}\right)^2 \qquad (5)$$


where:

$h$, $i$, $j$, and $k$ refer to substrata, first-stage, second-stage
and third-stage sample values;

$M_h$, $n_h$, and $m_h$ are defined as for equation (1)

$l$ = number of substrata in the population

$W_h$ = $M_h/M$ = the weight of substratum $h$ in the total
population, where $M$ is the household population of all substrata
combined

$Q_h$ = number of second-stage units (ultimate clusters) in
substratum $h$

$n$ = $n_h m_h$ = number of second-stage units (ultimate
clusters) selected in substratum $h$

$1 - \frac{n}{Q_h}$ = finite population correction factor for
second-stage sample units (ultimate clusters)

$X_{hij} = \sum_{k=1}^{c_h} W_{hijk} X_{hijk}$ = weighted total of
variable $X$ for the $j$th second-stage unit in substratum $h$

where:

$W_{hijk}$ = final weight for the $k$th sample unit in
the $j$th second-stage unit of the $i$th
first-stage unit in substratum $h$

$X_{hijk}$ = value of variable $X$ for the $k$th sample
unit in the $j$th second-stage unit in the
$i$th first-stage unit in substratum $h$

$X_h = \sum_{i=1}^{n_h} \sum_{j=1}^{m_h} X_{hij}$ = weighted total
of $X$ for substratum $h$

$X = \sum_{h=1}^{l} X_h$ = weighted total of $X$ for the population

The reader will notice that (5) is just the weighted sum of
the variances of the n second-stage weighted sample totals in each
of the l substrata. Where the number of sample units is constant
within each cluster in a substratum, so that individual household
weights¹ are also constant, it can be calculated manually without
too much difficulty.
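
The calculation for one substratum can be sketched as follows. The
function covers only the within-substratum part of the estimator,
that is, the fpc factor times n/(n - 1) times the sum of squared
deviations of the weighted cluster totals from their mean; the
combination across substrata is left to equation (5). The function
name and the figures are hypothetical.

```python
def substratum_variance_term(cluster_totals, clusters_in_population):
    """Ultimate-cluster variance term for one substratum.

    cluster_totals         -- weighted totals of the selected ultimate clusters
    clusters_in_population -- number of ultimate clusters in the substratum
    """
    n = len(cluster_totals)
    if n < 2:
        raise ValueError("at least two ultimate clusters are needed")
    mean_total = sum(cluster_totals) / n              # substratum total / n
    sum_squares = sum((x - mean_total) ** 2 for x in cluster_totals)
    fpc = 1 - n / clusters_in_population              # finite population correction
    return fpc * n / (n - 1) * sum_squares


# Three hypothetical weighted cluster totals drawn from 30 ultimate clusters.
term = substratum_variance_term([10.0, 12.0, 14.0], clusters_in_population=30)
print(term)
```

Because only n cluster totals per substratum enter the calculation,
it remains manageable by hand when household weights are constant
within the substratum.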

Where two-stage sampling is all that is needed the estimated
variance of a total would be:

$$\operatorname{Var}(X) = \sum_{h=1}^{l} W_h^2 \left(1 - \frac{n_h}{N}\right) \frac{n_h}{n_h-1} \sum_{i=1}^{n_h} \left(X_{hi} - \frac{X_h}{n_h}\right)^2 \qquad (6)$$

where:

$N$ = number of first-stage units (ultimate clusters) in
substratum $h$

$\left(1 - \frac{n_h}{N}\right)$ = the finite population correction
factor for first-stage sample units

$X_{hi} = \sum_{j=1}^{c_h} W_{hij} X_{hij}$ = weighted total of
variable $X$ for the $i$th first-stage unit in substratum $h$

$X_h = \sum_{i=1}^{n_h} X_{hi}$ = weighted total of $X$ for
substratum $h$

¹ Assuming adjustments for missing or duplicate households are ignored.


For the variance of means and ratios SUPER CARP provides
estimates that are rather complex. Equation (7) provides a
sometimes more biased but easier to calculate estimate for a
three-stage sample selected with pps:

[Equation (7) is not legible in the source. The surviving
fragments show a sum over the substrata h = 1, ..., l of a term
containing the finite population correction factor (1 - n/Q_h), a
factor involving n and n - 1, and squared deviations of the
second-stage weighted means about the substratum weighted mean,
summed over i and j.]

where:

$\bar{X}_{hij}$ = weighted mean of the $j$th second-stage unit
in the $i$th first-stage unit of substratum $h$

$\bar{X}_h$ = weighted mean for substratum $h$

Equation (8) gives an estimate for the variance of a mean for
the same kind of two-stage sample:

[Equation (8) is not legible in the source.]

where:

$\bar{X}_{hi}$ = weighted mean of variable $X$ for the $i$th
first-stage sample unit in substratum $h$

$\bar{X}_h$ = weighted mean of variable $X$ for substratum $h$


ANNEX   B:  GLOSSARY 


Accuracy: In a statistical sense accuracy refers to how closely an
estimated value approximates the true (and unknown) population
value. Accuracy includes the effect of both precision and bias.

Area  Sampling:  A  method  of  sampling  in  which  sample  elements  are 
defined  as  geographical  units.  Normally,  the  areas  to  be  sampled 
are  defined   so  as  to  be  approximately  equal    in  size. 

Asymptotically Unbiased: A characteristic of an estimator such
that it becomes less biased as the sample size approaches the
population size.

Bias:    A  persistent   tendency   to  err    in   a  particular  direction. 

Cluster: A group of sample units that are closely associated in
space.

Cumulative Population Listing: A method of listing sample units
such that the population of each successive sample unit is added
to the total of previous sample units, and the interval occupied
by each sample unit is noted. Thus, each sample unit occupies a
unique interval on the list of total sample units.

Design Effect: A measure of the extent to which clustering sample
observations increases the sample size required to obtain the
same level of precision as a simple random sample.

Finite Population Correction Factor (FPC): A number that reduces
the estimate of a variance according to the proportion of
population units included in the sample. The greater the proportion
of the population included in the sample, the smaller is the FPC
and the resulting estimate of the variance. The FPC always reduces
the estimated variance, though not by much when the population is
large relative to the sample.

First-Stage Sample Unit: The population unit selected at the first
stage of sampling. First-stage sample units are generally
administrative units, census enumeration areas or geographic units
for which the population is known but which are too large to cover
in their entirety.

Intraclass Correlation: The similarity between elements in the
population that are clustered together. Usually, elements in
close proximity to each other are more similar than elements which
are more dispersed. This reduces the precision of a sample of a
given size.


Non-Sampling Errors: Errors that result from sources other than
sampling. Nonsampling error includes error introduced by such
things as poorly supervised field staff, poorly worded questions,
errors in data recording, editing and analysis, respondent fatigue
or deception and similar matters.

Precision: The extent to which a sample reproduces the results
that would be obtained if we took a complete census using the same
methods of measurement, etc.

Primary Sample Unit: See first-stage sample unit.

Proportional  Allocation:  The  process  of  distributing  a  sample 
across  subgroups  of  a  population  in  proportion  to  the  number  of 
sample elements in the subgroup.
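
A minimal sketch of proportional allocation follows. The stratum names and sizes are hypothetical, and the rule for handing out leftover interviews (largest remainder first) is our assumption, not a procedure taken from this report.

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Split total_sample across strata in proportion to stratum size."""
    population = sum(stratum_sizes.values())
    allocation = {}
    remainders = {}
    for name, size in stratum_sizes.items():
        # Integer share and the leftover fraction (kept as an integer remainder).
        allocation[name], remainders[name] = divmod(total_sample * size, population)
    # Give any interviews lost to rounding down to the largest remainders.
    shortfall = total_sample - sum(allocation.values())
    for name in sorted(remainders, key=remainders.get, reverse=True)[:shortfall]:
        allocation[name] += 1
    return allocation


# Hypothetical strata, for illustration only.
strata = {"urban": 6000, "rural": 3000, "coastal": 1000}
allocation = proportional_allocation(strata, 200)
```

Working in whole numbers keeps the allocated interviews summing exactly to the intended sample size.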

Random Number: A number selected in such a way that all numbers in
a specified range have an equal probability of being selected.
Normally we use a computer-generated random number table to select
random numbers when the sample population is large.

Sample  Elements:    The  ultimate  units  from  which  we  collect  data. 

Sample Frame: A list of sample elements or sample units containing
the sample elements.

Sample  Units:  The  entities  which  we  select  in  a  sample  at  any 
stage. Only the last-stage sample units will be sample elements.

Sampling Error: The random or chance error that occurs when we
take a sample rather than a census of the whole population.
Sampling error can be reduced by increasing sample size.

Self-Weighting Sample: A sample that is designed in such a way
that the unweighted mean of the sample gives an unbiased estimate
of the population mean.
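
One well-known way to obtain a self-weighting sample, sketched here under our own assumptions, is a two-stage design in which clusters are selected with probability proportional to size and a fixed number of elements is then taken from each selected cluster. The function name and all figures are hypothetical.

```python
from fractions import Fraction


def overall_probability(clusters_selected, cluster_pop, total_pop, take):
    """Overall selection probability of one element in a two-stage design."""
    # Stage 1: cluster chosen with probability proportional to its size.
    first_stage = Fraction(clusters_selected * cluster_pop, total_pop)
    # Stage 2: a fixed number of elements taken within the chosen cluster.
    second_stage = Fraction(take, cluster_pop)
    return first_stage * second_stage


# Hypothetical design: 10 clusters selected from a population of 10,000,
# with 20 households taken in each selected cluster.
small_cluster = overall_probability(10, 200, 10000, 20)
large_cluster = overall_probability(10, 500, 10000, 20)
```

The cluster population cancels out of the product, so every household has the same overall probability of selection regardless of the size of its cluster, and the unweighted sample mean can be used directly.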

Serpentine Listing: A process of listing sample units that
crisscrosses an area in such a way as to ensure a good dispersion of
sample units across space when utilizing a systematic sampling
interval.

Simple  Random  Sample:  A  sample  in  which  all  sample  elements  have 
an equal probability of being selected and are drawn directly from
the  sample  population  without  clustering  or  utilizing  more  than 
one  stage  of  sampling. 

Standard  Deviation:  A  measure  of  the  dispersion  of  population 
(sample)  elements  about  their  mean.  The  standard  deviation  is  the 
square root of the variance.


Standard Error: The standard deviation of the sample estimate
divided by the square root of the sample size. The standard error
measures the extent to which the sample estimate differs from the
true population estimate. It approaches zero as the size of the
sample approaches the population size.
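
The calculation can be sketched for the mean of a simple random sample. The optional adjustment by the square root of 1 - n/N is an assumed form of the finite population correction, included to show why the standard error reaches zero when the sample covers the whole population. The function name and data are illustrative.

```python
import math
import statistics


def standard_error_of_mean(sample, population_size=None):
    """Sample standard deviation divided by the square root of the sample size."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    if population_size is not None:
        # Assumed finite population correction: shrinks to 0 when n equals N.
        se *= math.sqrt(1.0 - n / population_size)
    return se


# Hypothetical observations, for illustration only.
sample = [1, 2, 3, 4, 5]
se_large_population = standard_error_of_mean(sample)
se_full_census = standard_error_of_mean(sample, population_size=5)
```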

Stratified Sample: A sample from a population that has first been
divided into groups of known size. Stratifying populations
according to their similarity with respect to the things being
measured usually increases the precision of sample estimates for a
given size of sample.

Stratum: A group of population elements that share similar
characteristics with respect to the objectives of a sample survey.

Systematic Sampling: A method of sampling in which a fixed
sampling interval is applied to a list of population units in order
to identify the specific units to be included in the sample. The
fixed sampling interval is called the systematic sampling
interval.
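
A minimal sketch of systematic selection from a list follows. It assumes a whole-number interval and a random start between 1 and the interval; the function name and the list of units are hypothetical.

```python
import random


def systematic_sample(units, interval, start):
    """Select every interval-th unit, beginning at a 1-based start position."""
    if not 1 <= start <= interval:
        raise ValueError("start must lie between 1 and the sampling interval")
    return units[start - 1::interval]


# Hypothetical list of 20 households and a sampling interval of 5.
households = list(range(1, 21))
selected = systematic_sample(households, interval=5, start=3)

# In practice the start would be drawn at random within the first interval.
random_start = random.randint(1, 5)
```

With a start of 3 and an interval of 5, the households in positions 3, 8, 13 and 18 are selected.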

Two-Stage (Multistage) Sample: A sample drawn in more than one
stage.  Normally  all  but  the  last-stage  sample  units  are  made  up 
of  groups  of  sample  elements  that  are  too  numerous  to  survey  but 
for  which  the  population  is  known.  This  reduces  the  number  of 
households  that  ultimately  have  to  be  listed  in  order  to  draw  the 
sample  elements  to  be  interviewed.  Multistage  sampling  usually 
reduces  survey  costs. 

Ultimate  Cluster:  The  lowest  cluster  or  sample  unit  from  which 
sample  elements  are  selected. 

Variance: A measure of the dispersion of population (sample)
elements about their mean. The variance is the average squared
difference of the population (sample) elements from the population
(sample) mean.
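
The following sketch computes the variance and the standard deviation for a small set of values. Dividing by n - 1 for a sample and by n for a full population is the usual convention and is assumed here; the function name and data are illustrative.

```python
import math


def variance(values, sample=True):
    """Average squared difference of the values from their mean."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    # Assumed convention: n - 1 for a sample, n for a whole population.
    return sum_of_squares / (n - 1 if sample else n)


# Hypothetical data, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
population_variance = variance(data, sample=False)
population_std_dev = math.sqrt(population_variance)
sample_variance = variance(data)
```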

Weight (extrapolation weight, extrapolation factor, expansion
factor): A number attached to a sample observation equal to the
inverse of that sample element's probability of selection. This
number, relative to the total of all weights, reflects the share
of the population total represented by the sample element to which
it is attached.
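
The definition can be illustrated with a short sketch. The observations and selection probabilities are hypothetical, and the function name `expansion_weight` is ours.

```python
def expansion_weight(selection_probability):
    """Weight equal to the inverse of the probability of selection."""
    if not 0 < selection_probability <= 1:
        raise ValueError("probability must be greater than 0 and at most 1")
    return 1.0 / selection_probability


# Hypothetical observations: (household expenditure, probability of selection).
observations = [(10.0, 0.5), (20.0, 0.25)]

weights = [expansion_weight(p) for _, p in observations]
estimated_total = sum(w * y for w, (y, _) in zip(weights, observations))
weighted_mean = estimated_total / sum(weights)
# Share of the population represented by each sample element.
shares = [w / sum(weights) for w in weights]
```

The household selected with probability 0.25 stands for four households in the population, so it counts twice as heavily as the household selected with probability 0.5.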




